Scalable Loss-calibrated Bayesian Decision Theory and Preference Learning
Bayesian decision theory provides a framework for optimal action selection under uncertainty given a utility function over actions and world states and a distribution over world states. The application of Bayesian decision theory in practice is often limited by two problems: (1) in application domains such as recommendation, the true utility function of a user is a priori unknown and must be learned from user interactions; and (2) computing expected utilities under complex state distributions and (potentially uncertain) utility functions is often computationally expensive and requires tractable approximations. In this thesis, we aim to address both of these problems. For (1), we take a Bayesian non-parametric approach to utility function modeling and learning. In our first contribution, we exploit community structure prevalent in collective user preferences using a Dirichlet Process mixture of Gaussian Processes (GPs). In our second contribution, we take the underlying GP preference model of the first contribution and show how to jointly address both (1) and (2) by sparsifying the GP model in order to preserve optimal decisions while ensuring tractable expected utility computations. In our third and final contribution, we directly address (2) in a Monte Carlo framework by deriving an optimal loss-calibrated importance sampling distribution and showing how it can be extended to the uncertain utility representations developed in the previous contributions.
Our empirical evaluations in various applications, including multiple preference learning problems using synthetic and real user data and robotics decision-making scenarios derived from actual occupancy grid maps, demonstrate the effectiveness of the theoretical foundations laid in this thesis and pave the way for future advances that address important practical problems at the intersection of Bayesian decision theory and scalable machine learning.
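As a rough illustration of the decision-theoretic setting described above, the sketch below selects a Bayes-optimal action by Monte Carlo estimation of expected utilities. The utility table and state distribution are synthetic placeholders; the thesis's loss-calibrated and GP-based machinery is not reproduced here.

    import numpy as np

    rng = np.random.default_rng(0)

    n_states, n_actions = 4, 3
    # U(a, s): assumed known here; in the thesis setting it may itself be uncertain.
    utility = rng.uniform(0.0, 1.0, size=(n_actions, n_states))

    # Posterior over world states p(s | data), e.g. produced by some inference step.
    state_probs = np.array([0.1, 0.2, 0.3, 0.4])

    # Plain Monte Carlo estimate of E_{p(s)}[U(a, s)] for every action.
    samples = rng.choice(n_states, size=10_000, p=state_probs)
    expected_utility = utility[:, samples].mean(axis=1)

    best_action = int(np.argmax(expected_utility))
    print(expected_utility, best_action)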
Selective Mixup Helps with Distribution Shifts, But Not (Only) because of Mixup
Mixup is a highly successful technique to improve generalization of neural
networks by augmenting the training data with combinations of random pairs.
Selective mixup is a family of methods that apply mixup to specific pairs, e.g.
only combining examples across classes or domains. These methods have claimed
remarkable improvements on benchmarks with distribution shifts, but their
mechanisms and limitations remain poorly understood.
We examine an overlooked aspect of selective mixup that explains its success
in a completely new light. We find that the non-random selection of pairs
affects the training distribution and improves generalization by means
completely unrelated to the mixing. For example, in binary classification, mixup
across classes implicitly resamples the data toward a uniform class distribution,
a classical solution to label shift. We show empirically that this implicit
resampling explains much of the improvements in prior work. Theoretically,
these results rely on a regression toward the mean, an accidental property that
we identify in several datasets.
We have found a new equivalence between two successful methods: selective
mixup and resampling. We identify limits of the former, confirm the
effectiveness of the latter, and find better combinations of their respective
benefits.
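A minimal sketch of the implicit-resampling observation, on assumed synthetic data: pairing examples across classes means each class fills half of every mixed batch, so the effective class distribution becomes uniform regardless of the original label shift. The mixing coefficients and setup below are illustrative, not the paper's experimental protocol.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy binary dataset with label shift: roughly 90% class 0, 10% class 1.
    y = (rng.random(10_000) < 0.1).astype(int)
    X = rng.normal(loc=y[:, None], scale=1.0, size=(10_000, 2))

    idx0, idx1 = np.where(y == 0)[0], np.where(y == 1)[0]

    def cross_class_pairs(batch_size):
        # Selective mixup "across classes": each pair takes one example per class,
        # so each class contributes to exactly half of every mixed batch.
        a = rng.choice(idx0, size=batch_size)
        b = rng.choice(idx1, size=batch_size)
        lam = rng.beta(0.2, 0.2, size=batch_size)
        x_mix = lam[:, None] * X[a] + (1 - lam[:, None]) * X[b]
        y_mix = lam * y[a] + (1 - lam) * y[b]
        return x_mix, y_mix

    x_mix, y_mix = cross_class_pairs(512)
    # The average soft label is ~0.5 despite the 90/10 split in the raw data:
    print(y_mix.mean())  # implicit resampling toward a uniform class distribution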
Learning And Optimization Of The Kernel Functions From Insufficiently Labeled Data
Amongst all the machine learning techniques, kernel methods are increasingly becoming
popular due to their efficiency, accuracy and ability to handle high-dimensional
data. The fundamental problem related to these learning techniques is the selection of
the kernel function. Therefore, learning the kernel as a procedure in which the kernel
function is selected for a particular dataset is highly important. In this thesis, two approaches
to learn the kernel function are proposed: transferred learning of the kernel
and an unsupervised approach to learn the kernel. The first approach uses transferred
knowledge from unlabeled data to cope with situations where training examples are
scarce. Unlabeled data is used in conjunction with labeled data to construct an optimized
kernel using Fisher discriminant analysis and maximum mean discrepancy. The classification
accuracy, i.e. the fraction of correctly predicted test examples, obtained with the base
kernels and with the optimized kernel is compared on two datasets involving satellite
images and synthetic data, where the proposed approach produces better results. The second
approach is an unsupervised method to learn a linear combination of kernel functions.
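As a hedged illustration of the end product of the second approach, the sketch below forms a learned kernel as a convex combination of base RBF kernels. The bandwidths and weights are placeholders standing in for whatever the Fisher-discriminant or MMD-based optimization would return.

    import numpy as np

    rng = np.random.default_rng(0)

    def rbf_kernel(X, Z, gamma):
        # Gram matrix of an RBF kernel k(x, z) = exp(-gamma * ||x - z||^2).
        d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    X = rng.normal(size=(50, 3))

    # Base kernels: RBF kernels at several bandwidths (illustrative choices).
    gammas = [0.1, 1.0, 10.0]
    base_grams = [rbf_kernel(X, X, g) for g in gammas]

    # A learned kernel as a convex combination of the base kernels; the weights are
    # placeholders for the output of the actual learning procedure.
    weights = np.array([0.2, 0.5, 0.3])
    K = sum(w * G for w, G in zip(weights, base_grams))
    print(K.shape)  # (50, 50) combined Gram matrix, still positive semi-definite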
Soccer event detection via collaborative multimodal feature analysis and candidate ranking
This paper presents a framework for soccer event detection through collaborative analysis of the textual, visual and aural modalities. The basic notion is to decompose a match video into smaller segments until ultimately the desired eventful segment is identified. Simple features are considered, namely the minute-by-minute reports from sports websites (i.e. text), the semantic shot classes of far and close-up views (i.e. visual), and the low-level features of pitch and log-energy (i.e. audio). The framework demonstrates that, despite considering simple features and avoiding the use of labeled training examples, event detection can be achieved at very high accuracy. Experiments conducted on ~30 hours of soccer video show very promising results for the detection of goals, penalties, yellow cards and red cards.
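A hypothetical sketch of the candidate-ranking step, assuming per-segment visual and audio cues have already been extracted for the minute flagged by the text report; the scoring rule below is illustrative and not the paper's actual combination of modalities.

    # Illustrative candidate segments inside the minute named in the text report.
    # All numbers are synthetic placeholders.
    segments = [
        {"start": 12.0, "far_view_ratio": 0.8, "log_energy": 0.3, "pitch": 0.2},
        {"start": 31.5, "far_view_ratio": 0.2, "log_energy": 0.9, "pitch": 0.8},
        {"start": 47.0, "far_view_ratio": 0.5, "log_energy": 0.6, "pitch": 0.5},
    ]

    def score(seg):
        # Hypothetical combination: eventful moments tend to coincide with close-up
        # shots (low far-view ratio) and excited commentary (high energy and pitch).
        return (1.0 - seg["far_view_ratio"]) + seg["log_energy"] + seg["pitch"]

    ranked = sorted(segments, key=score, reverse=True)
    print(ranked[0]["start"])  # top-ranked candidate segment for the reported event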
Unshuffling Data for Improved Generalization
Generalization beyond the training distribution is a core challenge in
machine learning. The common practice of mixing and shuffling examples when
training neural networks may not be optimal in this regard. We show that
partitioning the data into well-chosen, non-i.i.d. subsets treated as multiple
training environments can guide the learning of models with better
out-of-distribution generalization. We describe a training procedure to capture
the patterns that are stable across environments while discarding spurious
ones. The method makes a step beyond correlation-based learning: the choice of
the partitioning allows injecting information about the task that cannot be
otherwise recovered from the joint distribution of the training data. We
demonstrate multiple use cases with the task of visual question answering,
which is notorious for dataset biases. We obtain significant improvements on
VQA-CP, using environments built from prior knowledge, existing meta data, or
unsupervised clustering. We also get improvements on GQA using annotations of
"equivalent questions", and on multi-dataset training (VQA v2 / Visual Genome)
by treating them as distinct environments.
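As an illustrative sketch of training over explicit environments (not necessarily the paper's exact objective), the code below partitions synthetic data into two environments with different spurious correlations and penalizes disagreement between per-environment gradients, one simple way to favor patterns that are stable across environments.

    import torch

    torch.manual_seed(0)

    def make_env(n, flip):
        # Synthetic environment: the stable feature predicts y; the spurious feature
        # correlates with it positively in one environment and negatively in the other.
        x_stable = torch.randn(n, 1)
        x_spurious = x_stable * (-1 if flip else 1) + 0.5 * torch.randn(n, 1)
        y = (x_stable + 0.1 * torch.randn(n, 1) > 0).float()
        return torch.cat([x_stable, x_spurious], dim=1), y

    envs = [make_env(512, flip=False), make_env(512, flip=True)]

    w = torch.zeros(2, 1, requires_grad=True)
    opt = torch.optim.SGD([w], lr=0.1)

    for step in range(200):
        losses, grads = [], []
        for x, y in envs:
            loss = torch.nn.functional.binary_cross_entropy_with_logits(x @ w, y)
            grads.append(torch.autograd.grad(loss, w, create_graph=True)[0])
            losses.append(loss)
        # Penalize disagreement between per-environment gradients.
        penalty = (grads[0] - grads[1]).pow(2).sum()
        total = torch.stack(losses).mean() + 10.0 * penalty
        opt.zero_grad()
        total.backward()
        opt.step()

    print(w.detach().view(-1))  # the weight on the spurious feature should stay small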
- …